
Conversation

@cheme (Contributor) commented Aug 7, 2020

Light client CHT pruning on polkadot is erasing headers that babe requires during its epoch pruning.

The header cache allows things to work, except that if the node gets shut down and restarted, the db is missing some headers and the chain gets bricked.

This PR creates a SharedPruningRequirements struct that is designed to be passed around client components and to hold db-pruning-related constraints.

Then the CHT pruning (when babe sets need_mapping_for_light_pruning of the shared pruning requirements to true) will not
prune the number-to-key lookup for the canonical header.
It is babe that will later prune it through its fork tree epoch pruning.

For this PR the only constraint is a finalized block height to keep untouched; it is set by babe depending on its stored epochs and constrains the CHT pruning of the light client headers.

This means that in babe we update this info after epoch pruning, and in the light client we limit CHT pruning; when the limit applies, we need to buffer the pending pruning ranges.
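For reference, a rough sketch of the shape such a shared constraint could take (type, field and method names here are illustrative, not the PR's actual code):

```rust
use std::sync::{Arc, RwLock};

/// Sketch only: pruning constraint shared between client components.
/// The single constraint for now is a finalized block height at or above
/// which the light client must not prune its header mappings.
#[derive(Clone)]
pub struct SharedPruningRequirements<N> {
    keep_from: Arc<RwLock<Option<N>>>,
}

impl<N: Copy> SharedPruningRequirements<N> {
    pub fn new() -> Self {
        Self { keep_from: Arc::new(RwLock::new(None)) }
    }

    /// Set by babe after its epoch pruning, based on its stored epochs.
    pub fn set_keep_from(&self, number: N) {
        *self.keep_from.write().expect("lock poisoned") = Some(number);
    }

    /// Read by the light client before CHT pruning: mappings for blocks
    /// at or above this height must be kept for now.
    pub fn keep_from(&self) -> Option<N> {
        *self.keep_from.read().expect("lock poisoned")
    }
}
```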

Note that someone with a better grasp of ForkTree and babe EpochChange may be able to thin down the needed_parent_relation range.

At first I tried to do things in a simpler/quicker way, but none of my attempts were good. There may still be a more straightforward solution that does not rely on assumptions (that would be better than this PR, which adds code).

Failed previous attempts:

- treat a missing header as not being a descendant of the node, or some other default behaviour for missing nodes.
  -> does not work well, it actually breaks the fork tree.
- keep a threshold number of blocks for CHT pruning (but it does not really work since we do not know how long it will be before the next babe pruning, as finalization can stall).
- do babe pruning more often and before CHT pruning.
  -> does not work well with polkadot where the slot length (2400) > the CHT pruning window.

@cheme cheme added A3-in_progress Pull request is in progress. No review needed at this stage. B3-apinoteworthy labels Aug 7, 2020
@cheme cheme requested a review from andresilva as a code owner August 7, 2020 19:52
Comment on lines 540 to 551
#[derive(Clone)]
/// Pruning requirement to share between multiple client component.
///
/// This allows pruning related synchronisation. For instance in light
/// client we need to synchronize header pruning from CHT (every N blocks)
/// with the pruning from consensus used (babe for instance require that
/// its epoch headers are not pruned which works as long as the slot length
/// is less than the CHT pruning window.
/// Each compenent register at a given index (call are done by this order).
///
/// Note that this struct could be split in two different struct (provider without
/// component and Component), depending on future usage of this shared info.
A Member reviewer suggested moving the `#[derive(Clone)]` attribute below the doc comment, so the doc comment comes first and the derive sits directly above the struct (doc comment text unchanged).

Comment on lines 673 to 674
#[derive(Eq, PartialEq)]
/// Define a block number limit to apply.
A Member reviewer suggested the same reorder here: move `#[derive(Eq, PartialEq)]` below the doc comment (doc comment text unchanged).

@cheme cheme added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 17, 2020
@cheme (Contributor, Author) commented Aug 17, 2020

I simplified the code a bit. I could still use traits to make instantiation a bit safer, and I could also drop support for other sources of constraints and make the code non-generic.
Also, maybe we do not need to keep multiple pending pruning ranges but only a single aggregated range.
But at this point I am still wondering if there might be some better way to fix this (avoid babe querying pruned headers). cc @svyatonik?

pub enum PruningLimit<N> {
    /// Ignore.
    None,
    /// The component require at least this number
A Contributor commented:

IIUC from the code, N here is actually the number of the oldest block that shouldn't be pruned. If I'm right, then the doc comment looks wrong - i.e. some component may need the last 1024 headers, but we may be at block 1_000_000, so N would actually be 1_000_000 - 1024.
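Under that reading, the enum and its docs might look roughly like this (a sketch only; the second variant's name and payload are assumptions, since the quoted snippet above is truncated):

```rust
/// Sketch: the `KeepFrom` variant name and payload are assumed for illustration.
pub enum PruningLimit<N> {
    /// No constraint from this component.
    None,
    /// Absolute number of the oldest block that must not be pruned.
    /// E.g. a component needing the last 1024 headers while the chain is at
    /// block 1_000_000 would report KeepFrom(1_000_000 - 1024).
    KeepFrom(N),
}
```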

cache: Arc<DbCacheSync<Block>>,
header_metadata_cache: Arc<HeaderMetadataCache<Block>>,
shared_pruning_requirements: SharedPruningRequirementsSource<Block>,
pending_cht_pruning: RwLock<VecDeque<(NumberFor<Block>, NumberFor<Block>)>>,
@svyatonik (Contributor) commented Aug 17, 2020:

Actually we may only store the numbers of the oldest and newest blocks we want to prune. I.e. instead of vec![(1, 2048), (2049, 4096), (4097, 5120), ..., (5121, 8192)] we may only store two numbers - 1 and 8192 :)
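A small sketch of that simplification, replacing the VecDeque of ranges with a single aggregated span (names are illustrative, not the PR's code):

```rust
use std::sync::RwLock;

/// Sketch: remember only the overall span of blocks whose CHT pruning was
/// deferred, instead of a queue of individual (from, to) ranges.
pub struct PendingChtPruning<N> {
    range: RwLock<Option<(N, N)>>, // (oldest, newest), inclusive
}

impl<N: Copy + Ord> PendingChtPruning<N> {
    pub fn new() -> Self {
        Self { range: RwLock::new(None) }
    }

    /// Merge a newly deferred range into the aggregate.
    pub fn push(&self, from: N, to: N) {
        let mut guard = self.range.write().expect("lock poisoned");
        *guard = Some(match *guard {
            None => (from, to),
            Some((lo, hi)) => (lo.min(from), hi.max(to)),
        });
    }

    /// Take the whole pending span once pruning becomes possible again.
    pub fn take(&self) -> Option<(N, N)> {
        self.range.write().expect("lock poisoned").take()
    }
}
```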

@svyatonik (Contributor) commented:

Re the simpler option - I haven't finished reviewing yet + I only have shallow knowledge of epochs + pruning. But would it be enough to know whether header H (associated with epoch E) is canonical or non-canonical to determine if we need to prune the entry E in the fork tree? If so, we may tune header pruning a bit so that the number => hash mapping won't be pruned for canonical headers - i.e. change the remove_key_mappings call to this.
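The link behind "this" is not preserved here, but a purely illustrative sketch of that kind of tweak, on toy in-memory columns rather than the real client-db code, could look like:

```rust
use std::collections::HashMap;

/// Toy stand-ins for the light-db columns involved (illustrative only).
#[derive(Default)]
struct LightColumns {
    headers: HashMap<[u8; 32], Vec<u8>>,    // hash -> encoded header
    number_to_hash: HashMap<u64, [u8; 32]>, // number -> canonical hash
}

/// Sketch of the suggestion: always drop the header itself, but keep the
/// number -> hash lookup for canonical headers so the fork-tree pruning can
/// later resolve canonicality without needing the pruned header.
fn prune_header(cols: &mut LightColumns, number: u64, hash: [u8; 32], is_canonical: bool) {
    cols.headers.remove(&hash);
    if !is_canonical {
        cols.number_to_hash.remove(&number);
    }
}
```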

One last addition - IIUC, BABE fails to import a block if pruning fails, right? How critical is pruning for BABE? If it isn't critical, then it is probably better to just log the issue rather than fail the process?

@cheme (Contributor, Author) commented Aug 17, 2020

Re simpler option - I haven't finished reviewing yet + I only have shallow knowledge of epochs + pruning. But would it be enough to know if header H (associated with epoch E) is canonical/non-canonical to determine if we need to prune the entry E in the fork tree?

I wonder if knowing whether H is canonical/non-canonical is actually the operation in the fork tree that requires reading CHT-pruned headers (basically it fails when looking for the common ancestor of two epoch fork tree roots).
IIRC the missing header occurs at

child.number < *number && is_descendent_of(&child.hash, hash).unwrap_or(false))

on the is_descendent_of call.

If so, we may tune header pruning a bit so that number => hash won't be pruned for canonical headers - i.e. change remove_key_mappings call to this.

I am not sure I understand this correctly, but yes, I feel like there may be something to do in babe to avoid querying old headers; my attempts at doing so were not very successful.

BABE fails to import block if pruning fails, right? How critical is pruning for BABE? If it isn't critical, then probably it is better just to log the issue, not to fail the process?

Seems like a good idea to me.
It still means that babe will never get pruned, but it is probably better to be able to use the light client for a while before resyncing it.
In fact, if we don't exit the light client during sync, the issue never occurs thanks to the header cache.

cheme added 2 commits August 17, 2020 15:12
This reverts commit bf40274.

Allowing failure on a PR that tries to avoid failure seems awkward.
@svyatonik (Contributor) commented:

My idea is to prune all fork tree nodes that correspond to blocks with number < best_finalized_number and that are non-canonical (this is why we need to leave the number => hash mapping in light storage after pruning the header). Their children must be pruned too. Maybe some new ForkTree method? ForkTree::iterate_prune(predicate)?

So basically you start with the roots - if some roots have number < last_finalized and they're non-canonical (this doesn't require the header to be in storage), then you prune them together with their children. Then you proceed to the children of the selected roots and again only keep canonical ones. Continue until you find that all roots must be kept. Does it make sense?
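A rough sketch of what such an iterate_prune walk could look like, using a toy tree type rather than the real ForkTree (the method does not exist yet; this only illustrates the traversal described above):

```rust
/// Toy stand-in for a fork-tree node (not the real `ForkTree` type).
struct Node {
    number: u64,
    hash: [u8; 32],
    children: Vec<Node>,
}

/// Prune every subtree whose root is below the finalized number and
/// non-canonical; keep walking into the subtrees that are retained.
fn iterate_prune(
    roots: &mut Vec<Node>,
    best_finalized: u64,
    is_canonical: &impl Fn(&[u8; 32]) -> bool,
) {
    roots.retain_mut(|root| {
        if root.number < best_finalized && !is_canonical(&root.hash) {
            // Finalized and non-canonical: the whole subtree can go, and
            // this check never needed the pruned header itself.
            false
        } else {
            iterate_prune(&mut root.children, best_finalized, is_canonical);
            true
        }
    });
}
```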

@cheme (Contributor, Author) commented Aug 18, 2020

I see: keep a KeyLookup mapping to resolve the fork tree pruning (remove non-canonical roots without having to compute a common parent through headers).
Then the pruning can use the fork tree information to remove those lookups (through the fork tree child relation).
Here we can code this without a 'babe - light_db' dependency from babe's perspective: you only need to let babe know it needs to delete mappings on pruning, which is a simple configuration to set properly when running a light client.
From the 'light-db' (CHT pruning) perspective you also need a configuration to skip pruning the number-to-key lookup; this configuration depends on the consensus used, so we still need some SharedPruningRequirement, but it does not have to share memory and is only used when instantiating the light client (to ensure the correct configuration is used).

So in this case SharedPruningRequirement just says: one of the earlier components ('babe') took responsibility for pruning the non-canonical 'from number' key lookups, so CHT pruning does not need to.

If keeping shared memory we could also simply store those fork tree roots in the SharedPruningRequirement and compute their new state before CHT pruning. But that would only optimize the db footprint and is probably not worth it.

I will try the first solution.
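If that first solution works out, the shared struct could shrink to a plain configuration fixed at instantiation, roughly along these lines (a sketch only; names are illustrative, not final PR code):

```rust
/// Sketch: a configuration decided when building the light client,
/// with no shared mutable state.
#[derive(Clone, Copy, Default)]
pub struct PruningRequirements {
    /// True when the consensus component (babe here) takes responsibility
    /// for deleting the non-canonical number -> hash lookups during its
    /// fork-tree epoch pruning, so CHT pruning must leave them in place.
    pub consensus_prunes_number_mappings: bool,
}
```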

@cheme cheme added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Aug 18, 2020
@svyatonik (Contributor) commented:

From the 'light-db' (CHT pruning) perspective you also need a configuration to skip pruning the number-to-key lookup; this configuration depends on the consensus used, so we still need some SharedPruningRequirement, but it does not have to share memory and is only used when instantiating the light client (to ensure the correct configuration is used).

Probably better to leave this info in light-db forever - it'll be easier (unless it breaks anything else). Maybe an optimization for follow-up PRs? Not sure. But I'm not insisting :)

@cheme (Contributor, Author) commented Aug 18, 2020

Sync seems to be working OK with the new code (the number of key lookups sometimes peaks quite high but seems to come back down close to the number of headers).
The code will need some refactoring before being reviewable again.

@cheme cheme requested a review from mxinden as a code owner August 19, 2020 10:26
@cheme cheme added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 19, 2020
@gnunicorn gnunicorn added A1-onice and removed A0-please_review Pull request needs code review. labels Sep 9, 2020
@cheme cheme added the C1-low PR touches the given topic and has a low impact on builders. label Sep 9, 2020
@gnunicorn (Contributor) commented:

Closed due to it being stale for a while.

@gnunicorn gnunicorn closed this Oct 19, 2020